Abstract. Second-order measures, such as the two-point
correlation function, are geometrical quantities describing 
the clustering properties of a point distribution. In this ar- 
ticle well-known estimators for the correlation integral are 
reviewed and their relation to geometrical estimators for 
the two-point correlation function is put forward. Sim- 
ulations illustrate the range of applicability of these es- 



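With the mean number density denoted by $\overline{\rho}$,

$$\overline{\rho}^{\,2}\, g(r)\, \mathrm{d}V(\mathbf{x}_1)\, \mathrm{d}V(\mathbf{x}_2) \tag{1}$$

describes the probability to find a point in the volume
element $\mathrm{d}V(\mathbf{x}_1)$ and another point in $\mathrm{d}V(\mathbf{x}_2)$, at the
distance $r = \|\mathbf{x}_1 - \mathbf{x}_2\|$; $\|\cdot\|$ is the Euclidean norm of a vector.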
timators. The interpretation of the two-point correlation
function as the excess of clustering with respect to Poisson 
distributed points has led to biases in common estimators. 
Comparing with the approximately unbiased geometrical 
estimators, we show how biases enter the estimators intro- 



The correlation integral $C(r)$ (e.g. Grassberger & Procaccia 1984) is the average number of points inside a ball of
radius $r$ centred on a point of the distribution; hence,

$$C(r) = \int_0^r \mathrm{d}s\; \overline{\rho}\, 4\pi s^2\, g(s). \tag{2}$$
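As a quick numerical illustration of Eq. (2), the following sketch integrates $\overline{\rho}\,4\pi s^2 g(s)$ with a simple midpoint rule. The function name `correlation_integral` and its arguments are hypothetical and only serve this example; for a Poisson process, $g(s) = 1$, the result should approach $\overline{\rho}\,\frac{4\pi}{3} r^3$.

```python
import math

def correlation_integral(g, rho_bar, r, steps=2000):
    """Approximate C(r) = int_0^r rho_bar * 4*pi*s^2 * g(s) ds (midpoint rule)."""
    ds = r / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds                      # midpoint of the k-th sub-interval
        total += rho_bar * 4.0 * math.pi * s * s * g(s) * ds
    return total

# Poisson process: g(s) = 1 for all s.
c_poisson = correlation_integral(lambda s: 1.0, rho_bar=200.0, r=0.1)
c_exact = 200.0 * 4.0 * math.pi / 3.0 * 0.1 ** 3
```

Any other model for $g(s)$ can be passed as a Python callable in the same way.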
Peebles (1983), Landy & Szalay (1993



We give recommendations for the 
application of the estimators, including details of the nu- 
merical implementation. The properties of the estimators 
of the correlation integral are illustrated in an applica- 
tion to a sample of IRAS galaxies. It is found that, due 
to the limitations of current galaxy catalogues in num- 
ber and depth, no reliable determination of the correla- 
tion integral on large scales is possible. In the sample of 
IRAS galaxies considered, several estimators using different
finite-size corrections yield different results on scales$^{1}$
larger than $20\,h^{-1}$ Mpc, while all of them agree on smaller
scales. 



In Appendix A we discuss other common two-point mea- 
sures. 

The correlation integral C(r) and the two-point cor- 
relation function g(r) are defined as ensemble averages. 
If we want to estimate C(r) from one given point set, as 
provided by the spatial coordinates of galaxies, we have 
to use volume averages which yield an estimator $\widehat{C}(r)$.

Since all astronomical catalogues are spatially limited,
i.e. the observed galaxies lie inside a spatial domain $\mathcal{D}$, we
must correct for boundary effects. Estimators of the two-point
correlation function including finite-size corrections
have been proposed by Hewett (1982), Davis & Peebles
(1983), Rivolo (1986), Landy & Szalay (1993), Hamilton
(1993), Szapudi & Szalay (1998), and Pons-Borderia et

1. Introduction

Second-order measures, also called two-point measures,
are still one of the major tools to characterize the spatial
distribution of galaxies and clusters. Probably the best
known are the two-point correlation function $g(r)$ and the



Stoyan (1998) introduced improved estimators of point
process statistics, with special emphasis on the accurate
estimation of the density $\overline{\rho}$.

An estimator $\widehat{C}(r)$ is called unbiased if the expectation
of $\widehat{C}(r)$ equals the true value of $C(r)$:

$$\mathbb{E}\big[\widehat{C}(r)\big] = C(r). \tag{3}$$



normed cumulant $\xi_2(r) = g(r) - 1$ (e.g. Peebles 1980).



1 Throughout this article we measure length in units of
$h^{-1}$ Mpc, with $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$.



$\mathbb{E}$ denotes the expectation value, the average over realizations
of the point process$^{2}$. An estimator is called consistent$^{3}$
if the estimates $\widehat{C}(r)$ obtained inside a finite sample
geometry $\mathcal{D}$ from one space filling realization converge
towards the true value of $C(r)$ as the sample volume $|\mathcal{D}|$
increases:

$$\widehat{C}(r) \;\xrightarrow{\;|\mathcal{D}| \to \infty\;}\; C(r). \tag{4}$$

2 We assume that the point process is stationary.

3 For an ergodic point process an unbiased estimator is also
consistent.



We call an estimator ratio-unbiased if it is the quotient of
two unbiased quantities. Whether such a quotient gives
a reliable estimate must be tested. Often this is only
possible with simulations (see Sect. 2.5, see also Hui &
Gaztañaga 1998).



For the comparison of a simulated point distribution 
with an observed galaxy distribution within the same sam- 
ple geometry and with the same selection effects, unbi- 
asedness (or consistency) is not a major concern. It is more 
important that the variance of the estimator is small. This 
may give tighter bounds on the cosmological parameters 
entering the simulations. 

This article is organized as follows. In Sect. 2 we will
review several estimators for the correlation integral. With
simulations of two drastically different point process models,
namely a featureless Poisson process and a highly
structured line segment process, the variance and the bias
of the estimators are investigated. Closely connected to
these estimators for the correlation integral are the geometrical
estimators for the two-point correlation function
which will be discussed in Sect. 3. Some popular pair-count
estimators for the two-point correlation function are
considered in Sect. 4. We derive the geometrical properties
of the pair-counts. By comparing with the geometrical estimators
of Sect. 3 and with numerical examples we show
how biases enter. We comment on the improved estimators
of Stoyan & Stoyan (1998) in Sect. 5. As an application,
we investigate the clustering properties of galaxies
in a volume limited sample of the IRAS 1.2 Jy redshift
catalogue in Sect. 6. We conclude and give recommendations
for the application of the estimators in Sect. 7. In the
Appendices we summarize currently used two-point measures
and discuss some details concerning the numerical
implementation of the estimators.



2. Estimators for the correlation integral C(r) 

Consider a set of points $X = \{\mathbf{x}_i\}_{i=1}^{N}$, $\mathbf{x}_i \in \mathbb{R}^3$, supplied
by the redshift coordinates of a galaxy or cluster survey.
All points $\mathbf{x}_i$ are inside the sample geometry $\mathcal{D}$.



2.1. The naive estimator for C(r) 

The naive and biased estimator of the correlation integral
$\widehat{C}_0(r)$ is defined by

$$\widehat{C}_0(r) = \frac{1}{N} \sum_{i=1}^{N} N_i(r), \tag{5}$$



Fig. 1. In the naive estimator $\widehat{C}_0(r)$ all points are used
as centres for the determination of $N_i(r)$. $N_i(r)$ is underestimated
for points near the boundary of $\mathcal{D}$.

Fig. 2. In the minus-estimators only points inside $\mathcal{D}_{-r}$
are taken into account in the determination of $N_i(r)$.


where

$$N_i(r) = \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[0,r]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big) \tag{6}$$

is the number of points in a sphere with radius $r$ around
the point $\mathbf{x}_i$.



$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{for } x \in A, \\ 0 & \text{for } x \notin A \end{cases} \tag{7}$$

denotes the indicator function of the set $A$. $\widehat{C}_0(r)$ is the
mean value of $N_i(r)$, averaged over all points $\mathbf{x}_i$. For points
$\mathbf{x}_i$ near the boundary of $\mathcal{D}$, and for large radii $r$ in particular,
the number of points $N_i(r)$ is underestimated, and
$\widehat{C}_0(r)$ is biased towards smaller values (see Fig. 1).
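The naive estimator can be written down directly from the counts $N_i(r)$. The sketch below is a brute-force illustration in plain Python; the names `neighbour_counts` and `c0_naive` are hypothetical, and no boundary correction of any kind is applied, which is exactly why this estimator is biased.

```python
import math

def neighbour_counts(points, r):
    """N_i(r): number of other points within distance r of each point."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= r:
                counts[i] += 1
    return counts

def c0_naive(points, r):
    """Naive estimator: mean of N_i(r) over all points, boundary ignored."""
    counts = neighbour_counts(points, r)
    return sum(counts) / len(points)

# Four illustrative points in the unit cube.
pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (0.9, 0.9, 0.9)]
```

The double loop costs $O(N^2)$ distance evaluations; for large catalogues a spatial grid or tree would normally replace it.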

2.2. Minus-estimators for C(r) 

As an obvious restriction, only points further than $r$ away
from the boundary of $\mathcal{D}$ are used as centres for the calculation
of $N_i(r)$. Doing so we make sure that we see all
data points inside the sphere of radius $r$ around a point
$\mathbf{x}_i$. $\mathcal{D}_{-r}$ is the shrunken window (see Fig. 2)

$$\mathcal{D}_{-r} = \{\mathbf{y} \in \mathcal{D} : B_r(\mathbf{y}) \subset \mathcal{D}\}, \tag{8}$$

where $B_r(\mathbf{y})$ denotes a sphere of radius $r$ centred on $\mathbf{y}$,
and

$$N_r = \sum_{i=1}^{N} \mathbf{1}_{\mathcal{D}_{-r}}(\mathbf{x}_i) \tag{9}$$

yields the number of points inside $\mathcal{D}_{-r}$. The minus-estimator
$\widehat{C}_1(r)$ reads:

$$\widehat{C}_1(r) = \frac{1}{N_r} \sum_{i=1}^{N} \mathbf{1}_{\mathcal{D}_{-r}}(\mathbf{x}_i)\, N_i(r) \quad \text{for } N_r > 0. \tag{10}$$
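For a cubic window the shrunken window is again a cube, so the minus-estimator of Eq. (10) is easy to sketch. The example assumes the unit cube $[0,1]^3$ as sample geometry; the function names are hypothetical, and `None` is returned where $N_r = 0$ and the estimator is undefined.

```python
import math

def in_shrunken_cube(p, r):
    """True if the ball B_r(p) lies completely inside the unit cube."""
    return all(r <= c <= 1.0 - r for c in p)

def c1_minus(points, r):
    """Minus-estimator C_1(r): mean of N_i(r) over centres inside D_-r."""
    total = 0
    n_r = 0
    for i, p in enumerate(points):
        if not in_shrunken_cube(p, r):
            continue                      # too close to the boundary
        n_r += 1
        # Neighbours are taken from the whole sample, not only from D_-r.
        total += sum(1 for j, q in enumerate(points)
                     if j != i and math.dist(p, q) <= r)
    if n_r == 0:
        return None                       # estimator undefined for N_r = 0
    return total / n_r

pts = [(0.5, 0.5, 0.5), (0.6, 0.5, 0.5), (0.05, 0.5, 0.5)]
```

With these three points and $r = 0.2$, only the first two serve as centres, while the third is too close to the boundary.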



In the case of stationary point processes this estimator is
ratio-unbiased (e.g. Baddeley et al. 1993). However, for
large radii only a small fraction of the points is included
as centres. Therefore, we are limited to scales up to the
radius of the largest sphere that lies completely inside the
sample geometry. With this estimator we do not have to
make any assumption about the distribution of points outside
the window $\mathcal{D}$. This is important for the investigation
of inhomogeneous, scale-invariant or "fractal" point distributions.
Pietronero and coworkers employed this type of
estimator (see Appendix A and Sylos Labini et al. 1998).



Let us introduce another variant of the minus-estimator,
which also does not require any assumption about the
missing data outside the sample window $\mathcal{D}$. An unbiased
estimator of the number density is given by

$$\widehat{\rho}_1 = \frac{N}{|\mathcal{D}|}, \tag{11}$$

and an alternative ratio-unbiased minus-estimator may
be defined by

$$\widehat{C}_2(r) = \frac{1}{\widehat{\rho}_1\, |\mathcal{D}_{-r}|} \sum_{i=1}^{N} \mathbf{1}_{\mathcal{D}_{-r}}(\mathbf{x}_i)\, N_i(r) \quad \text{for } |\mathcal{D}_{-r}| > 0. \tag{12}$$

$\widehat{C}_2(r)$ differs from $\widehat{C}_1(r)$ in that we estimate the number
density with $\widehat{\rho}_1$ instead of $N_r/|\mathcal{D}_{-r}|$, so that an estimate of
$\overline{\rho}$ from a larger volume than in $\widehat{C}_1(r)$ is used. This may
be important if the galaxy catalogue under consideration
is centred on a large cluster. Then $N_r > |\mathcal{D}_{-r}|\, \widehat{\rho}_1$, and
therefore $\widehat{C}_1(r)$ systematically underestimates the correlation
integral $C(r)$. On the other hand, in $\widehat{C}_1(r)$ the same
points are used for the determination of the numerator
and denominator, which empirically yields a reduced variance.
In Sect. 2.5 we will see that the large variance of
$\widehat{C}_2(r)$ makes this estimator rather useless.
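The second minus-estimator only changes the normalisation: the sum over centres in the shrunken window is divided by $\widehat{\rho}_1 |\mathcal{D}_{-r}|$ instead of $N_r$. A minimal sketch, again assuming the unit cube as window (so $|\mathcal{D}| = 1$ and $|\mathcal{D}_{-r}| = (1-2r)^3$) and using the hypothetical name `c2_minus`:

```python
import math

def c2_minus(points, r):
    """Minus-estimator C_2(r) for points in the unit cube (|D| = 1)."""
    side = 1.0 - 2.0 * r                  # edge length of the shrunken cube
    if side <= 0.0:
        return None                       # |D_-r| = 0: estimator undefined
    rho1 = len(points) / 1.0              # density estimate N / |D|
    total = 0
    for i, p in enumerate(points):
        if all(r <= c <= 1.0 - r for c in p):     # p lies in D_-r
            total += sum(1 for j, q in enumerate(points)
                         if j != i and math.dist(p, q) <= r)
    return total / (rho1 * side ** 3)

pts = [(0.5, 0.5, 0.5), (0.6, 0.5, 0.5), (0.05, 0.5, 0.5)]
```

Because numerator and normalisation now come from different point sets, fluctuations no longer cancel, which is one way to read the larger variance reported for this estimator.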

2.3. Ripley-estimator for C(r) 

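The Ripley-estimator (Ripley 1976) uses all points inside
$\mathcal{D}$ as centres for the counts $N_i(r)$ (see Eq. (6)). The bias in
$\widehat{C}_0(r)$ is corrected with weights:

$$\widehat{C}_3(r) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[0,r]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)\; \omega_l\big(\mathbf{x}_i, \|\mathbf{x}_i - \mathbf{x}_j\|\big)\; \omega_g\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big), \tag{13}$$

with the local pair weight (Ripley 1976)

$$\omega_l(\mathbf{x}_i, s) = \begin{cases} \dfrac{4\pi s^2}{\mathrm{area}\big(\partial B_s(\mathbf{x}_i) \cap \mathcal{D}\big)} & \text{for } \partial B_s(\mathbf{x}_i) \cap \mathcal{D} \neq \emptyset, \\[2ex] 0 & \text{for } \partial B_s(\mathbf{x}_i) \cap \mathcal{D} = \emptyset, \end{cases} \tag{14}$$

Fig. 3. The local weight $\omega_l(\mathbf{x}_m, s)$ equals unity for the
point $\mathbf{x}_m$ with $s = \|\mathbf{x}_m - \mathbf{x}_n\|$. At the point $\mathbf{x}_i$ with $s =
\|\mathbf{x}_i - \mathbf{x}_j\|$ the local weight is larger than unity.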



inversely proportional to the part of the spherical surface
with radius $s$ around the point $\mathbf{x}_i$ which is inside the survey
boundaries (see Fig. 3). $\partial B_s(\mathbf{x}_i)$ is the surface of the
sphere $B_s(\mathbf{x}_i)$ with radius $s$ centred on $\mathbf{x}_i$. With $\omega_l(\mathbf{x}_i, s)$
we correct locally for possible points at distance $s$ outside
the sample geometry $\mathcal{D}$.
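For a spherical sample geometry the local weight has a closed form, because the part of the shell inside the window is a spherical cap. The sketch below assumes a ball-shaped window of radius `R` centred on the origin and a point at distance `d` from the centre; the function name is hypothetical, and returning 0 when the shell misses the window entirely is the convention adopted in this example.

```python
import math

def local_weight_ball(d, s, R):
    """Local weight for a point at distance d (<= R) from the centre of a
    ball-shaped window of radius R and a shell of radius s > 0 around it."""
    if d + s <= R:                # shell entirely inside the window
        return 1.0
    if s - d >= R:                # shell entirely outside the window
        return 0.0
    # A shell point at angle theta (measured from the direction towards the
    # window centre) lies inside the window if cos(theta) >= c.
    c = (d * d + s * s - R * R) / (2.0 * d * s)
    inside_fraction = 0.5 * (1.0 - c)     # cap area divided by 4*pi*s^2
    return 1.0 / inside_fraction

w_inner = local_weight_ball(0.2, 0.3, 1.0)    # shell fully inside
w_edge = local_weight_ball(1.0, 0.5, 1.0)     # centre on the boundary
w_outer = local_weight_ball(0.1, 1.2, 1.0)    # shell misses the window
```

For a point on the boundary and a very small shell radius the weight tends to 2, since half of the shell is then outside the window. For other window shapes the cap fraction has no such simple form and is usually obtained by numerical or Monte-Carlo integration.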
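The global weight

$$\omega_g(s) = \frac{|\mathcal{D}|}{\big|\{\mathbf{z} \in \mathcal{D} \mid \partial B_s(\mathbf{z}) \cap \mathcal{D} \neq \emptyset\}\big|} \tag{15}$$

was introduced by Ohser (1983). $\omega_g(s)$ is inversely proportional
to the volume occupied by points $\mathbf{z} \in \mathcal{D}$ for
which the surface $\partial B_s(\mathbf{z})$ intersects the sample geometry
$\mathcal{D}$ (see Fig. 4). In typical sample geometries the global
weight $\omega_g(s)$ is equal to unity up to fairly large radii $s$.
For example, $\omega_g(s)$ exceeds unity only for $s > R$ in a
spherical sample geometry with radius $R$ (see Fig. 4). For
$r < \max\{s \in \mathbb{R}^{+} \text{ with } |\{\mathbf{x} \in \mathcal{D} \mid \partial B_s(\mathbf{x}) \cap \mathcal{D} \neq \emptyset\}| > 0\}$,
$\widehat{C}_3(r)$ is ratio-unbiased (Ohser 1983).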



2.4. Ohser and Stoyan estimators for C(r)

Another estimator using a weighting strategy of point
pairs was proposed by Ohser & Stoyan (1981):

$$\widehat{C}_4(r) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[0,r]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)\, \frac{|\mathcal{D}|}{\gamma_{\mathcal{D}}(\mathbf{x}_i - \mathbf{x}_j)}, \quad \text{if } \gamma_{\mathcal{D}}(\mathbf{x}_i - \mathbf{x}_j) > 0 \text{ for all } \|\mathbf{x}_i - \mathbf{x}_j\| < r. \tag{16}$$

Here the pair-weight is equal to the fraction $|\mathcal{D}|/\gamma_{\mathcal{D}}(\mathbf{x})$,
with the set-covariance (see Fig. 5)

$$\gamma_{\mathcal{D}}(\mathbf{x}) = |\mathcal{D} \cap (\mathcal{D} + \mathbf{x})|. \tag{17}$$

$\mathcal{D} + \mathbf{x}$ is the sample geometry shifted by the vector $\mathbf{x}$,
and $\gamma_{\mathcal{D}}(\mathbf{x})$ is the volume of the intersection of the original
sample with the shifted sample. This estimator is ratio-unbiased
for stationary point processes; isotropy is not
needed in the proof (Ohser & Stoyan 1981).

Fig. 4. The shaded area marks the set $\{\mathbf{x} \in \mathcal{D} \mid \partial B_s(\mathbf{x}) \cap
\mathcal{D} \neq \emptyset\}$ for a spherical sample geometry $\mathcal{D} = B_R$, with
$s = R + a$.

Fig. 5. The set-covariance $\gamma_{\mathcal{D}}(\mathbf{x})$ is the volume of the
shaded set $\mathcal{D} \cap (\mathcal{D} + \mathbf{x})$.

Closely related to the estimator $\widehat{C}_4(r)$ is its isotropized
counterpart (Ohser & Stoyan 1981):

$$\widehat{C}_5(r) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[0,r]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)\, \frac{|\mathcal{D}|}{\overline{\gamma_{\mathcal{D}}}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)} \quad \text{for } \overline{\gamma_{\mathcal{D}}}(r) > 0, \tag{18}$$

where $\overline{\gamma_{\mathcal{D}}}(r)$ is the isotropized set-covariance:

$$\overline{\gamma_{\mathcal{D}}}(r) = \frac{1}{4\pi} \int_0^{2\pi}\!\! \int_0^{\pi} \sin(\theta)\, \mathrm{d}\theta\, \mathrm{d}\phi\; \gamma_{\mathcal{D}}\big(\mathbf{x}(r, \theta, \phi)\big). \tag{19}$$

2.5. Comparison of the estimators for C(r)

Since the estimators for $C(r)$ considered above are only
ratio-unbiased, we have tested whether they give reliable
results with two drastically different examples of point
processes. This also enables us to compare the variances
of the estimators. Several analytical approaches have been
put forward to investigate the variance of estimators for
two-point measures. The majority of them relies on Poisson
or binomial processes (e.g. Ripley 1988, and Landy
& Szalay 1993, see however Stoyan et al. 1993, and Bernstein
1994). A similar numerical comparison of estimators
for two-point measures in the two-dimensional case was
performed by Doguwa & Upton (1989).

As a simple point process model showing no large-scale
structure we study the behaviour of the estimators for a
Poisson process with mean number density $\overline{\rho}$. The mean
value of the correlation integral is then

$$C_P(r) = \overline{\rho}\, \frac{4\pi}{3}\, r^3. \tag{20}$$

In Fig. 6 a numerical comparison of the estimators $\widehat{C}_0(r)$
to $\widehat{C}_5(r)$ for a Poisson process with $\overline{\rho} = 200$ in the unit
cube is shown. The mean and the variance were determined
from 10,000 realizations. As expected, a strong bias
towards lower values for large $r$ is seen in $\widehat{C}_0(r)$; the other
estimators do not show any bias. $\widehat{C}_1(r)$ is defined only
for samples with $N_r > 0$. Since there were samples with
$N_r = 0$ for $r > 0.325$ within the 10,000 realizations of the
Poisson process, $\widehat{C}_1(r)$ is shown only for radii smaller than
0.325.

Looking at the absolute errors in Fig. 6, we see that
the minus-estimators exhibit larger errors than the others;
in particular, $\widehat{C}_2(r)$ becomes useless on larger scales. The
relative errors (the standard error per mean value) exhibit
a "shot noise" peak for small $r$ (see Fig. 7). All the estimators
using weighting schemes show comparable errors,
but especially for large $r$, the Ripley estimator $\widehat{C}_3(r)$ gives
the smallest errors.

To investigate the performance of the estimators for
highly structured and clustered point process models, we
study points randomly distributed on line segments which
are themselves uniformly distributed in space and direction.
From Stoyan et al. (1995), p. 286 (see also Martinez
et al. 1995) we obtain:

$$C_S(r) = \overline{\rho}\, \frac{4\pi}{3}\, r^3 + \frac{\overline{\rho}}{\overline{\rho}_s} \begin{cases} \dfrac{2r}{l} - \dfrac{r^2}{l^2} & \text{for } r < l, \\[2ex] 1 & \text{for } r \geq l. \end{cases} \tag{21}$$

$l$ is the length of the line segments and $\overline{\rho}_s$ is the mean
number density of line segments; $l\,\overline{\rho}_s$, $\overline{\rho}/(l\,\overline{\rho}_s)$, and $\overline{\rho}$ denote the
mean length density, the mean number of points per unit length
of line segment, and the mean number density in space, respectively.
A similar model for the distribution of galaxies is
discussed by Buryak & Doroshkevich (1996). In Fig. 8 we
compare the mean and the variance of the estimators for
10,000 realizations of a line segment process with $\overline{\rho} = 200$,
$l = 0.1$ and $\overline{\rho}_s = 20$.
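The two reference curves used in this comparison can be evaluated with a few lines of code. The sketch below assumes that the points on each segment are themselves Poisson distributed with a constant line density $\overline{\rho}/(l\,\overline{\rho}_s)$, so that the clustering term is the expected number of further points on the same segment within distance $r$ of a typical point; the function names are hypothetical.

```python
import math

def c_poisson(r, rho):
    """Correlation integral of a Poisson process with number density rho."""
    return rho * 4.0 * math.pi / 3.0 * r ** 3

def c_segments(r, rho, rho_s, length):
    """Correlation integral of the line segment process: the Poisson term
    plus the mean number of further points on the same segment."""
    line_density = rho / (rho_s * length)     # points per unit segment length
    if r < length:
        # mean length of the segment within distance r of a random point on it
        reachable = 2.0 * r - r * r / length
    else:
        reachable = length                    # the whole segment is in reach
    return c_poisson(r, rho) + line_density * reachable

# Parameters quoted in the text: rho = 200, l = 0.1, rho_s = 20.
c_at_l = c_segments(0.1, 200.0, 20.0, 0.1)
c_half = c_segments(0.05, 200.0, 20.0, 0.1)
```

For $r \geq l$ the clustering term saturates at the mean number of points per segment, here $200/20 = 10$, and only the Poisson term keeps growing.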

As before, $\widehat{C}_0(r)$ shows a strong bias on large scales,
but also the other estimators include some bias towards
smaller values. Some of the random samples showed
$N_r = 0$ for $r > 0.175$, therefore $\widehat{C}_1(r)$ is given only for
smaller radii. Comparing Fig. 8 and Fig. 6 we see that
this clustered point distribution leads to a significantly
larger variance (see also Stoyan 1983). The relative errors
(Fig. 9) on large scales are nearly twice as large as
in the case of a Poisson process with the same number
density (Fig. 7). Since we are looking at a clustered distribution,
the "shot noise" peak is shifted to very small
radii, not visible in Fig. 9. Again, the minus-estimator
$\widehat{C}_2(r)$ becomes unreliable for large $r$. The estimators using
weighting schemes display a significantly smaller variance
on all scales, whereas the Ripley estimator $\widehat{C}_3(r)$ gives the
smallest variance on large scales. Simulations with different
parameters $l$ and $\overline{\rho}_s$ led to the same conclusions.

A possible explanation why $\widehat{C}_3(r)$ shows a smaller variance
than $\widehat{C}_4(r)$ and $\widehat{C}_5(r)$ (see Figs. 7 and 9) is that the
local weight $\omega_l(\mathbf{x}_i, s)$ used in $\widehat{C}_3(r)$ is larger than unity
only for a point $\mathbf{x}_i$ with another point at distance $s\,(< r)$
and with $\mathbf{x}_i$ closer to the boundary of the sample than
$s$. The weight equals unity for all other point pairs. In contrast,
in $\widehat{C}_4(r)$ and $\widehat{C}_5(r)$ the corresponding weights are
larger than unity for all point pairs. Each of these three estimators
is ratio-unbiased, hence correcting for finite-size
effects, but a frequent use of weights larger than unity increases
the variance. Colombi et al. (1998) calculate the
weights used in the estimation of the factorial moments
minimizing the variance of the factorial moments (see also
Szapudi & Colombi 1996).



3. Geometrical estimators for the two point 
correlation function g(r) 

In contrast to estimators for the correlation integral, all
estimators of the two-point correlation function using a
finite bin width $\Delta$ are biased. A property similar to unbiasedness
is that the expectation of such an estimator
converges towards the true mean value of $g(r)$ for $\Delta \to 0$.
We call this approximately unbiased.

In this section we discuss estimators for the two-point correlation
function $g(r) = 1 + \xi_2(r)$ which can be derived
from the estimators for the correlation integral given in
Sect. 2 by using the relation

$$\overline{\rho}\, 4\pi r^2\, g(r) = \overline{\rho}\, 4\pi r^2 \big(1 + \xi_2(r)\big) = \frac{\mathrm{d}}{\mathrm{d}r} C(r). \tag{22}$$

Fig. 8. A comparison of the estimators $\widehat{C}_0(r)$ to $\widehat{C}_5(r)$ for a random distribution of points with
number density $\overline{\rho} = 200$ on line segments with length $l = 0.1$ and segment density $\overline{\rho}_s = 20$ in the unit box. The solid line marks
the sample mean, the shaded area is the $1\sigma$-range estimated from 10,000 realizations, and the dashed line is the true
$C_S(r)$.



3.1. The naive estimator for g(r) 

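In analogy to the estimator $\widehat{C}_0(r)$ we obtain the naive estimator
$\widehat{g}_0(r)$ for the two-point correlation function $g(r)$:

$$\widehat{g}_0(r) = \frac{1}{N} \sum_{i=1}^{N} \frac{n_i^{\Delta}(r)}{4\pi r^2 \Delta\, \widehat{\rho}_1}, \tag{23}$$

where

$$n_i^{\Delta}(r) = \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[r, r+\Delta]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big) = N_i(r + \Delta) - N_i(r) \tag{24}$$

is the number of points in the shell with radius in $[r, r + \Delta]$
around a point $\mathbf{x}_i$. $\widehat{\rho}_1 = N/|\mathcal{D}|$ provides an estimate of the
mean number density $\overline{\rho}$. The quotient $n_i^{\Delta}(r)/\Delta$ approximates
$\mathrm{d}N_i(r)/\mathrm{d}r$, and therefore

$$4\pi r^2\, \widehat{\rho}_1\, \widehat{g}_0(r) \;\xrightarrow{\;\Delta \to 0\;}\; \frac{\mathrm{d}}{\mathrm{d}r} \widehat{C}_0(r). \tag{25}$$

Similar to $\widehat{C}_0(r)$, $\widehat{g}_0(r)$ underestimates the two-point correlation
function $g(r)$.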
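The shell counts translate directly into code. The following brute-force sketch computes the naive estimator for one radial bin; `g0_naive` is a hypothetical name, the window volume is passed in as `volume`, and as in the text no boundary correction is applied.

```python
import math

def g0_naive(points, r, delta, volume=1.0):
    """Naive estimator of g(r) from shell counts, without edge correction."""
    n = len(points)
    rho1 = n / volume                         # density estimate N / |D|
    shell_total = 0                           # sum over i of the shell counts
    for i in range(n):
        for j in range(n):
            if i != j and r <= math.dist(points[i], points[j]) <= r + delta:
                shell_total += 1
    return shell_total / (n * 4.0 * math.pi * r * r * delta * rho1)

pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.5, 0.5, 0.5)]
g_bin = g0_naive(pts, r=0.05, delta=0.1)
```

Only the pair at separation 0.1 falls into the bin $[0.05, 0.15]$, and it is counted once from each of its two members.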



3.2. Minus-estimators for g(r) 

The minus-estimators for $g(r)$ are defined as follows:

$$\widehat{g}_1(r) = \frac{1}{N_r} \sum_{i=1}^{N} \mathbf{1}_{\mathcal{D}_{-r}}(\mathbf{x}_i)\, \frac{n_i^{\Delta}(r)}{4\pi r^2 \Delta\, \widehat{\rho}_1}, \tag{26}$$

$$\widehat{g}_2(r) = \frac{1}{\widehat{\rho}_1\, |\mathcal{D}_{-r}|} \sum_{i=1}^{N} \mathbf{1}_{\mathcal{D}_{-r}}(\mathbf{x}_i)\, \frac{n_i^{\Delta}(r)}{4\pi r^2 \Delta\, \widehat{\rho}_1}, \tag{27}$$

with $N_r > 0$ and $|\mathcal{D}_{-r}| > 0$. As in Sect. 3.1 we obtain
the minus-estimators for $g(r)$ as derivatives of the
minus-estimators for the correlation integral. Therefore,
$\widehat{g}_1(r)$ and $\widehat{g}_2(r)$ are ratio-unbiased in the limit $\Delta \to 0$.
Pietronero and coworkers use $\widehat{\rho}_1\, \widehat{g}_1(r)$ to estimate the conditional
density $\Gamma(r)$.

Fig. 7. The comparison of the relative errors of the estimators
for a Poisson process with number density $\overline{\rho} = 200$
in the unit box: $\widehat{C}_0(r)$ (solid), $\widehat{C}_1(r)$ (dotted), $\widehat{C}_2(r)$ (short
dashed), $\widehat{C}_3(r)$ (long dashed), $\widehat{C}_4(r)$ (short dashed-dotted,
on top of $\widehat{C}_3(r)$), $\widehat{C}_5(r)$ (long dashed-dotted, on top of
$\widehat{C}_4(r)$).

Fig. 9. The comparison of the relative errors of the estimators
for a random distribution of points with number
density $\overline{\rho} = 200$ on line segments with length $l = 0.1$ and
segment density $\overline{\rho}_s = 20$ in the unit box: $\widehat{C}_0(r)$ (solid),
$\widehat{C}_1(r)$ (dotted), $\widehat{C}_2(r)$ (short dashed), $\widehat{C}_3(r)$ (long dashed),
$\widehat{C}_4(r)$ (short dashed-dotted), $\widehat{C}_5(r)$ (long dashed-dotted,
on top of $\widehat{C}_4(r)$).

3.3. Rivolo estimator for g(r) 

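Rivolo (1986) suggested a pair-weighted estimator, defined as:

$$\widehat{g}_3(r) = \frac{1}{N} \sum_{i=1}^{N} \frac{n_i^{\Delta}(r)}{4\pi r^2 \Delta\, \widehat{\rho}_1}\, \omega_l(\mathbf{x}_i, r) = \frac{|\mathcal{D}|}{N^2} \sum_{i=1}^{N} \frac{n_i^{\Delta}(r)/\Delta}{\mathrm{area}\big(\partial B_r(\mathbf{x}_i) \cap \mathcal{D}\big)}. \tag{28}$$

For small $\Delta$ we obtain

$$\frac{n_i^{\Delta}(r)}{\Delta}\, \omega_l(\mathbf{x}_i, r) \;\xrightarrow{\;\Delta \to 0\;}\; \sum_{j=1,\, j \neq i}^{N} \delta^{D}\big(r - \|\mathbf{x}_i - \mathbf{x}_j\|\big)\, \omega_l\big(\mathbf{x}_i, \|\mathbf{x}_i - \mathbf{x}_j\|\big), \tag{29}$$

with the Dirac distribution $\delta^{D}(s)$. On small and intermediate
scales, the global weight $\omega_g$ equals unity (see
Eq. (13)), and the Rivolo estimator converges for $\Delta \to 0$
towards the derivative of the ratio-unbiased Ripley estimator:

$$4\pi r^2\, \widehat{\rho}_1\, \widehat{g}_3(r) \;\xrightarrow{\;\Delta \to 0\;}\; \frac{\mathrm{d}}{\mathrm{d}r} \widehat{C}_3(r). \tag{30}$$

Hence, the Rivolo estimator is approximately unbiased for
radii $r$ where $\omega_g(r) = 1$.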

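3.4. The Fiksel and Ohser estimators for g(r)

Fiksel (1988) introduced the following estimator for the
two-point correlation function (see also Pons-Borderia et
al. 1998):

$$\widehat{g}_4(r) = \frac{|\mathcal{D}|}{N^2} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \frac{\mathbf{1}_{[r, r+\Delta]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)}{4\pi r^2 \Delta}\, \frac{|\mathcal{D}|}{\gamma_{\mathcal{D}}(\mathbf{x}_i - \mathbf{x}_j)}, \quad \text{if } \gamma_{\mathcal{D}}(\mathbf{x}_i - \mathbf{x}_j) > 0 \text{ for all } \|\mathbf{x}_i - \mathbf{x}_j\| < r. \tag{31}$$

With arguments presented in Sect. 3.3, this estimator can
be derived from the corresponding estimator $\widehat{C}_4(r)$ for the
correlation integral.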
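For a cubic window both the set-covariance and its direction average are available in closed form, which makes these pair weights cheap to evaluate. The sketch below assumes the unit cube $[0,1]^3$ as sample geometry; the function names are hypothetical, the cubic polynomial is the direction average of $\prod_k (1 - r|u_k|)$ over unit vectors $\mathbf{u}$ and holds for $0 \le r \le 1$, and a simple angular quadrature is included as a cross-check.

```python
import math

def set_covariance_cube(x):
    """Volume of the overlap of the unit cube with its copy shifted by x."""
    vol = 1.0
    for c in x:
        vol *= max(0.0, 1.0 - abs(c))
    return vol

def iso_set_covariance_cube(r):
    """Direction average of the set-covariance of the unit cube, 0 <= r <= 1."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("closed form only valid for 0 <= r <= 1")
    return 1.0 - 1.5 * r + 2.0 * r * r / math.pi - r ** 3 / (4.0 * math.pi)

def iso_set_covariance_numeric(r, n_theta=100, n_phi=200):
    """Midpoint quadrature of the angular average, used as a cross-check."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            x = (r * math.sin(theta) * math.cos(phi),
                 r * math.sin(theta) * math.sin(phi),
                 r * math.cos(theta))
            total += math.sin(theta) * set_covariance_cube(x)
    return total * d_theta * d_phi / (4.0 * math.pi)

def pair_weight_cube(xi, xj):
    """Pair weight |D| / gamma_D(x_i - x_j) for the unit cube (|D| = 1)."""
    diff = tuple(a - b for a, b in zip(xi, xj))
    return 1.0 / set_covariance_cube(diff)
```

The pair weight is larger than unity for every pair with non-zero separation, in line with the remark that these weights act on all pairs rather than only on those close to the boundary.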

Its isotropized counterpart $\widehat{g}_5(r)$ is given by (see
Stoyan & Stoyan 1994 and Ohser & Tscherny 1988):

$$\widehat{g}_5(r) = \frac{|\mathcal{D}|^2}{N^2\, \overline{\gamma_{\mathcal{D}}}(r)} \sum_{i=1}^{N} \frac{n_i^{\Delta}(r)}{4\pi r^2 \Delta} \quad \text{for } \overline{\gamma_{\mathcal{D}}}(r) > 0. \tag{32}$$

Ohser & Tscherny (1988) use a kernel-based method instead
of $n_i^{\Delta}(r)$ (see Sect. 7).

4. Estimators for the two point correlation
function g(r) based on DR and RR

In the cosmological literature, estimators for $g(r)$ are often
constructed by generating an additional set of random
points. In the following we consider $N_{\rm rd}$ Poisson distributed
points $\{\mathbf{y}_j\}_{j=1}^{N_{\rm rd}}$, all inside the sample geometry
$\mathbf{y}_j \in \mathcal{D}$, with the number density $\overline{\rho}_{\rm rd} = N_{\rm rd}/|\mathcal{D}|$. The set of
the $N$ data points (e.g. galaxies) is given by $\{\mathbf{x}_i\}_{i=1}^{N}$, as
before. We employ the common notation, and define

$$DD(r) = \sum_{i=1}^{N} n_i^{\Delta}(r), \tag{33}$$

the number of data-data pairs with a distance in $[r, r + \Delta]$;
pairs are counted twice. The number of data-random pairs
with a distance in $[r, r + \Delta]$ is denoted by

$$DR(r) = \sum_{i=1}^{N} dr_i^{\Delta}(r), \tag{34}$$

and

$$dr_i^{\Delta}(r) = \sum_{j=1}^{N_{\rm rd}} \mathbf{1}_{[r, r+\Delta]}\big(\|\mathbf{x}_i - \mathbf{y}_j\|\big) \tag{35}$$

is the number of random points inside a shell with thickness
$\Delta$ at a distance $r$ from the data point $\mathbf{x}_i$. Similarly,

$$RR(r) = \sum_{i=1}^{N_{\rm rd}} rr_i^{\Delta}(r) \tag{36}$$

is the number of random-random pairs with a distance in
$[r, r + \Delta]$; pairs are counted twice. Finally, the number of
random points inside a shell with thickness $\Delta$ at a distance
$r$ from the random point $\mathbf{y}_i$ is given by

$$rr_i^{\Delta}(r) = \sum_{j=1,\, j \neq i}^{N_{\rm rd}} \mathbf{1}_{[r, r+\Delta]}\big(\|\mathbf{y}_i - \mathbf{y}_j\|\big). \tag{37}$$

Firstly, we show that $DR(r)$ and $RR(r)$ are Monte-Carlo
versions of well defined geometrical quantities$^{4}$.
Secondly, we rewrite the estimators using the pair-counts
$DD(r)$, $DR(r)$, and $RR(r)$, in terms of these geometric
quantities and calculate the biases entering the pair-count
estimators.

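4.1. The geometric interpretation of DR and RR

For large $N_{\rm rd}$ and small $\Delta$ we obtain

$$dr_i^{\Delta}(r) = \overline{\rho}_{\rm rd}\; \mathrm{area}\big(\partial B_r(\mathbf{x}_i) \cap \mathcal{D}\big)\, \Delta, \tag{38}$$

and therefore

$$DR(r) = \overline{\rho}_{\rm rd}\, \Delta \sum_{i=1}^{N} \mathrm{area}\big(\partial B_r(\mathbf{x}_i) \cap \mathcal{D}\big) = 4\pi r^2 \Delta\, \overline{\rho}_{\rm rd} \sum_{i=1}^{N} \frac{1}{\omega_l(\mathbf{x}_i, r)} \tag{39}$$

is proportional to the average inverse local weight $\omega_l$ (see
Eq. (14)).

4 These results were independently derived by Stoyan &
Stoyan (1998).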

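To clarify the geometrical properties of $RR(r)$ we rewrite
the set-covariance (Eq. (17)) as a Monte-Carlo integration
using $N_{\rm rd}$ random points $\mathbf{y}_i \in \mathcal{D}$. With $N_{\rm rd} \to \infty$
we obtain:

$$\gamma_{\mathcal{D}}(\mathbf{x}) = \frac{|\mathcal{D}|}{N_{\rm rd}} \sum_{i=1}^{N_{\rm rd}} \mathbf{1}_{\mathcal{D}}(\mathbf{y}_i + \mathbf{x}). \tag{40}$$

After angular averaging (see Eq. (19)) we insert an integral
over the delta distribution $\delta^{D}$:

$$\overline{\gamma_{\mathcal{D}}}(r) = \frac{|\mathcal{D}|}{N_{\rm rd}} \sum_{i=1}^{N_{\rm rd}} \frac{1}{4\pi} \int_0^{2\pi}\!\! \int_0^{\pi} \sin(\theta)\, \mathrm{d}\theta\, \mathrm{d}\phi \int_0^{\infty} \mathrm{d}r'\; \delta^{D}(r' - r)\; \mathbf{1}_{\mathcal{D}}\big(\mathbf{y}_i + \mathbf{x}(r', \theta, \phi)\big)
= \frac{|\mathcal{D}|}{N_{\rm rd}} \sum_{i=1}^{N_{\rm rd}} \frac{1}{4\pi r^2} \int_{\mathcal{D}} \mathrm{d}^3 z\; \delta^{D}\big(\|\mathbf{z} - \mathbf{y}_i\| - r\big). \tag{41}$$

The volume integral in the last line can be written as a
Monte-Carlo integration:

$$\overline{\gamma_{\mathcal{D}}}(r) = \frac{|\mathcal{D}|^2}{4\pi r^2\, N_{\rm rd} (N_{\rm rd} - 1)} \sum_{i=1}^{N_{\rm rd}} \sum_{j=1,\, j \neq i}^{N_{\rm rd}} \delta^{D}\big(\|\mathbf{y}_i - \mathbf{y}_j\| - r\big). \tag{42}$$

For large $N_{\rm rd}$ and small $\Delta$ this results in

$$\overline{\gamma_{\mathcal{D}}}(r) = \frac{|\mathcal{D}|^2}{4\pi r^2 \Delta\, N_{\rm rd}^2}\, RR(r). \tag{43}$$

Therefore $RR(r)$ is proportional to the isotropized set-covariance.
We summarize:

$$RR(r) = 4\pi r^2 \Delta\; \overline{\rho}_{\rm rd}^{\,2}\; \overline{\gamma_{\mathcal{D}}}(r), \tag{44}$$

$$DR(r) = 4\pi r^2 \Delta\; \overline{\rho}_{\rm rd} \sum_{i=1}^{N} \frac{1}{\omega_l(\mathbf{x}_i, r)}. \tag{45}$$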

4.2. The DD/RR estimator for g(r)

Traditionally, the two-point correlation function is estimated
by DD/RR,

$$\widehat{g}_6(r) = \frac{N_{\rm rd}^2}{N^2}\, \frac{DD(r)}{RR(r)}. \tag{46}$$

From Eq. (32) and (44) we see that $\widehat{g}_6(r)$ is a Monte-Carlo
version of the Ohser estimator $\widehat{g}_5(r)$, which is ratio-unbiased
for $\Delta \to 0$.

4.3. The Davis-Peebles estimator for g(r)

Davis & Peebles (1983) popularized the DD/DR estimator,

$$\widehat{g}_7(r) = \frac{N_{\rm rd}}{N}\, \frac{DD(r)}{DR(r)}. \tag{47}$$

Landy & Szalay (1993) have shown that this estimator is
biased. Rewriting $\widehat{g}_7(r)$ with Eq. (33) and (39) gives

$$\widehat{g}_7(r) = \frac{|\mathcal{D}|}{N^2} \sum_{i=1}^{N} \frac{n_i^{\Delta}(r)/\Delta}{4\pi r^2\; \frac{1}{N} \sum_{j=1}^{N} \frac{1}{\omega_l(\mathbf{x}_j, r)}}. \tag{48}$$

A comparison with the Rivolo estimator (Eq. (28)),

$$\widehat{g}_3(r) = \frac{|\mathcal{D}|}{N^2} \sum_{i=1}^{N} \frac{n_i^{\Delta}(r)/\Delta}{4\pi r^2}\; \omega_l(\mathbf{x}_i, r),$$

which is ratio-unbiased for $\Delta \to 0$, reveals the geometrical
nature of the bias. In $\widehat{g}_7(r)$ the local weights $\omega_l$ are
replaced by an average over these local weights with the
tacit assumption that the local weight for a sample point
is independent of its relative position with respect to the
boundary, which is unjustified.
Let us consider the difference

$$\widehat{g}_7(r) - \widehat{g}_3(r) = \frac{1}{N} \sum_{i=1}^{N} \frac{n_i^{\Delta}(r)}{\widehat{\rho}_1\, 4\pi r^2 \Delta}\; A(\mathbf{x}_i, r), \tag{49}$$

with

$$A(\mathbf{x}_i, r) = \frac{1}{\frac{1}{N} \sum_{j=1}^{N} \frac{1}{\omega_l(\mathbf{x}_j, r)}} - \omega_l(\mathbf{x}_i, r). \tag{50}$$

Fig. 10 displays the ensemble average of

$$a(r) = \frac{1}{N} \sum_{i=1}^{N} A(\mathbf{x}_i, r) \tag{51}$$

and illustrates the bias entering $\widehat{g}_7(r)$. If we look at a
clustered distribution with $g(r) \gg 1$, the bias is negligible
on small scales. However, on large scales, we have $a(r) > 0$
of order unity. Since for a stationary point process $g(r)$
also approaches unity on large scales, the bias from $a(r)$
is important, and $g(r)$ may be overestimated by $\widehat{g}_7(r)$.

Furthermore, $n_i^{\Delta}(r)$ and $A(\mathbf{x}_i, r)$ are not independent,
and $\mathbb{E}[a(r)]$ may overestimate the true bias, but since
$n_i^{\Delta}(r) \geq 0$ and the term $\frac{n_i^{\Delta}(r)}{\widehat{\rho}_1\, 4\pi r^2 \Delta}$ from Eq. (49) is of order
unity on large scales for a homogeneous point process,
$n_i^{\Delta}(r)$ and $A(\mathbf{x}_i, r)$ have to conspire to give $\mathbb{E}[\widehat{g}_7(r) -
\widehat{g}_3(r)] = 0$, if $\widehat{g}_7(r)$ should be unbiased.

Fig. 10. The average of $a(r)$ over 10,000 realizations of
a Poisson process with $\overline{\rho} = 200$ (solid), a line segment
process with number density $\overline{\rho} = 200$, segment length $l =
0.3$, and segment density $\overline{\rho}_s = 1$ (dotted), $\overline{\rho}_s = 3$ (short
dashed), $\overline{\rho}_s = 5$ (long dashed), and $\overline{\rho}_s = 10$ (short dashed-dotted).

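4.4. The Landy-Szalay estimator for g(r)

Landy & Szalay (1993) introduced a new estimator for the
two-point correlation function (see also Szapudi & Szalay
1998):

$$\widehat{g}_8(r) = \frac{N_{\rm rd}^2}{N^2}\, \frac{DD(r)}{RR(r)} - 2\, \frac{N_{\rm rd}}{N}\, \frac{DR(r)}{RR(r)} + 2. \tag{52}$$

By using Eq. (44) and (45) and the definition of $\widehat{g}_5(r)$ we
have

$$\widehat{g}_8(r) = \widehat{g}_5(r) - 2\, b(r) + 2, \tag{53}$$

with

$$b(r) = \frac{N_{\rm rd}}{N}\, \frac{DR(r)}{RR(r)} = \frac{\frac{1}{N} \sum_{i=1}^{N} \frac{1}{\omega_l(\mathbf{x}_i, r)}}{\overline{\gamma_{\mathcal{D}}}(r)/|\mathcal{D}|}. \tag{54}$$

Since $\widehat{g}_5(r)$ and equivalently $\widehat{g}_6(r)$ are ratio-unbiased for
$\Delta \to 0$, $\widehat{g}_8(r)$ is approximately unbiased only if

$$\mathbb{E}[b(r)] = 1. \tag{55}$$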
For a Poisson process in a spherical window this can be
verified from basic geometric considerations. Landy & Szalay
(1993) showed that $\widehat{g}_8(r)$ is ratio-unbiased for Poisson
and binomial processes in arbitrary windows. By definition,
neither a Poisson process nor a binomial process
shows large-scale structures. To investigate the bias entering
$\widehat{g}_8(r)$ we estimate $\mathbb{E}[b(r)]$ numerically for the highly
structured line segment process. In Fig. 11 a strong bias
is visible if only few line segments are inside the sample
geometry. When more and more structure elements enter,
$\mathbb{E}[b(r)]$ tends towards unity. Similar to the properties of
the DD/DR estimator, this bias $b(r)$ is unimportant on
small scales for a clustering process (with $g(r) \gg 1$), as
provided by the galaxy distribution. However, for a point
distribution with structures on the size of the sample (see
e.g. Huchra et al. 1990), $b(r)$ introduces a bias towards
higher values in $\widehat{g}_8(r)$ on large scales.

Fig. 11. The average of the bias $2 - 2b(r)$ over 10,000
realizations of a Poisson process in the unit cube with $\overline{\rho} =
200$ (solid line), and line segment processes (see Sect. 2.5)
with the same number density, the segment length $l = 0.3$,
and mean number of lines per volume $\overline{\rho}_s = 1$ (dotted),
$\overline{\rho}_s = 3$ (short dashed), $\overline{\rho}_s = 5$ (long dashed), and $\overline{\rho}_s = 10$
(short dashed-dotted).



4.5. The Hamilton estimator for g(r)

Hamilton (1993) suggested the following estimator:

$$\widehat{g}_9(r) = \frac{DD(r)\, RR(r)}{DR(r)^2}. \tag{56}$$

With the expressions for $\widehat{g}_7(r)$ (Eq. (47)) and $b(r)$ (Eq. (54)) we obtain

$$\widehat{g}_9(r) = \frac{\widehat{g}_7(r)}{b(r)}. \tag{57}$$

The Hamilton estimator is unbiased only in the unlikely
case where the biases from $1/b(r)$ and $\widehat{g}_7(r)$ cancel.

Stoyan & Stoyan (1998) found a negative bias in $\widehat{g}_9(r)$
for a Poisson and a Matern cluster process. They attribute
this to an inappropriate estimate of the density
(see Sect. 5). A simulation of a Matern cluster process
gives $\mathbb{E}[b(r)] \approx 1$ as for the Poisson process, which suggests
that mainly the same bias as in the Davis-Peebles
estimator contributes (see also Landy & Szalay 1993).
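Once the three pair counts are available for a radial bin, the four pair-count estimators of this section differ only in how the counts are combined. The sketch below collects them in one hypothetical helper, `pair_count_estimators`, and makes the relations between them explicit; the pair counts in the example are invented numbers, chosen so that $b(r) = 1$.

```python
def pair_count_estimators(dd, dr, rr, n, n_rd):
    """Combine pair counts of one radial bin into estimates of g(r).
    dd and rr count every pair twice; n and n_rd are the numbers of
    data points and random points."""
    ratio = n_rd / n
    g6 = ratio ** 2 * dd / rr         # DD/RR
    g7 = ratio * dd / dr              # Davis-Peebles, DD/DR
    b = ratio * dr / rr               # the quantity b(r)
    g8 = g6 - 2.0 * b + 2.0           # Landy-Szalay, written for g = 1 + xi
    g9 = dd * rr / dr ** 2            # Hamilton
    return {"g6": g6, "g7": g7, "g8": g8, "g9": g9, "b": b}

# Invented pair counts for illustration only.
est = pair_count_estimators(dd=120.0, dr=1000.0, rr=10000.0, n=100, n_rd=1000)
```

In this example all four estimates coincide because $b(r) = 1$; as soon as $b(r)$ deviates from unity they separate, with the Landy-Szalay value shifted by $2 - 2b(r)$ and the Hamilton value rescaled by $1/b(r)$.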



5. Improved estimators for C(r) and g(r) 



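Recently, Stoyan & Stoyan (1998) proposed several improvements
for ratio-unbiased estimators of point process
statistics.
With

$$\kappa(r) = \overline{\rho}\, C(r), \tag{58}$$

the density of point-pairs with a distance smaller than $r$,
ratio-unbiased estimators of the correlation integral $C(r)$
may be written as

$$\widehat{C}(r) = \frac{\widehat{\kappa}(r)}{\widehat{\rho}(r)}. \tag{59}$$

Unbiased estimates of $\kappa(r)$ are given by

$$\widehat{\kappa}_3(r) = \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[0,r]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)\; \frac{\omega_l\big(\mathbf{x}_i, \|\mathbf{x}_i - \mathbf{x}_j\|\big)\; \omega_g\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)}{|\mathcal{D}|}, \tag{60}$$

$$\widehat{\kappa}_4(r) = \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[0,r]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)\; \frac{1}{\gamma_{\mathcal{D}}(\mathbf{x}_i - \mathbf{x}_j)}, \tag{61}$$

$$\widehat{\kappa}_5(r) = \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathbf{1}_{[0,r]}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)\; \frac{1}{\overline{\gamma_{\mathcal{D}}}\big(\|\mathbf{x}_i - \mathbf{x}_j\|\big)}. \tag{62}$$

Using the unbiased estimate $\widehat{\rho}_1 = N/|\mathcal{D}|$ of the density $\overline{\rho}$ in
Eq. (59), we recover the ratio-unbiased estimators $\widehat{C}_3(r)$
to $\widehat{C}_5(r)$.

Stoyan & Stoyan (1998) showed that one can do better.
For stationary point processes they consider the following
unbiased estimate of the density $\overline{\rho}$, also depending on the
scale $r$ under consideration:

$$\widehat{\rho}_V(r) = \frac{\sum_{i=1}^{N} p_V(\mathbf{x}_i, r)}{\int_{\mathcal{D}} \mathrm{d}^3 x\; p_V(\mathbf{x}, r)}, \tag{63}$$

where $p_V(\mathbf{x}, r)$ is a non-negative weight function. For estimators
of Ripley's $K(r) = C(r)/\overline{\rho}$ (see Appendix A),
Stoyan & Stoyan (1998) employ the volume weight:

$$p_V(\mathbf{x}, r) = \frac{|\mathcal{D} \cap B_r(\mathbf{x})|}{\frac{4\pi}{3}\, r^3}. \tag{64}$$

For $p = 3, 4, 5$ we define the improved ratio-unbiased estimators
$\widehat{C}_p^{\,\rm i}(r)$ for the correlation integral

$$\widehat{C}_p^{\,\rm i}(r) = \frac{\widehat{\kappa}_p(r)}{\widehat{\rho}_V(r)}. \tag{65}$$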



A numerical comparison, similar to the one performed in
Sect. 2.5, showed that the variance of the Ripley-estimator
$\widehat{C}_3(r)$ is already equal to that of its improved counterpart $\widehat{C}_3^{\,\rm i}(r)$.
The improved estimators $\widehat{C}_4^{\,\rm i}(r)$ and $\widehat{C}_5^{\,\rm i}(r)$ now show the
same variance as the Ripley-estimator, hence a smaller
variance than the original estimators $\widehat{C}_4(r)$ and $\widehat{C}_5(r)$.
The biases do not change between the normal and the
improved versions of the estimators.
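To make the scale-dependent density estimate concrete, the sketch below evaluates the volume weight for a ball-shaped window, where the overlap of the window with the ball $B_r(\mathbf{x})$ is a lens with a standard closed form. It reads the density estimate as the sum of the weights over the data points divided by the integral of the weight over the window; that reading, the spherical window and all function names are assumptions made for this illustration.

```python
import math

def ball_overlap_volume(d, r, R):
    """Volume of the overlap of a ball of radius r, centred at distance d
    from the origin, with the window ball of radius R centred on the origin."""
    if d + r <= R:                            # small ball inside the window
        return 4.0 * math.pi / 3.0 * r ** 3
    if d + R <= r:                            # window inside the small ball
        return 4.0 * math.pi / 3.0 * R ** 3
    if d >= r + R:                            # no overlap
        return 0.0
    # lens-shaped overlap of two intersecting balls
    return (math.pi * (R + r - d) ** 2
            * (d * d + 2.0 * d * (r + R) - 3.0 * (r - R) ** 2) / (12.0 * d))

def volume_weight(d, r, R):
    """Volume weight: overlap volume divided by the volume of the full ball."""
    return ball_overlap_volume(d, r, R) / (4.0 * math.pi / 3.0 * r ** 3)

def rho_volume_weighted(distances, r, R, steps=2000):
    """Sum of the weights over the points, divided by the integral of the
    weight over the window (radial midpoint rule)."""
    numerator = sum(volume_weight(d, r, R) for d in distances)
    step = R / steps
    denominator = 0.0
    for k in range(steps):
        d = (k + 0.5) * step
        denominator += 4.0 * math.pi * d * d * volume_weight(d, r, R) * step
    return numerator / denominator

# Distances of three illustrative points from the centre of a unit ball.
rho_large_r = rho_volume_weighted([0.1, 0.5, 0.9], r=3.0, R=1.0)
```

For $r \geq 2R$ the weight is the same for every position in the window, so the estimate reduces to $N/|\mathcal{D}|$; for smaller $r$ points near the boundary are down-weighted.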

In close analogy to the estimators of the correlation 
integral, the estimators of the two-point correlation func- 
tion can also be improved. Consider the product density 
7 l( r ) = /°2(xi,X2) with r — ||xi — Xa|| (see Appendix A) 
then 



P 2 g(r) = v(r)- 



(66) 



30 - 



Ratio-unbiased estimators of the two-point correlation 
function g{r) may be written as 



p 2 (r) 



(67) 




The estimators r]^(r) may be defined in terms of the es- 
timators K^(r) (for details see [Stoyan fc Stoyan 1998 ): 



K^(r + A) - KfAx) 
4nr 2 A 



(68) 



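An additional complication enters, since we now have to estimate $\overline{\rho}^2$ instead of $\overline{\rho}$. Neither $(\widehat{\rho})^2$ nor $\widehat{\rho}\,(\widehat{\rho}|\mathcal{D}| - 1)/|\mathcal{D}|$ gives an unbiased estimate of $\overline{\rho}^2$ in general (the latter is unbiased for a Poisson process).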
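The statement about the Poisson case can be checked numerically. The following sketch draws the number of points $N$ in a window from a Poisson distribution and compares the two estimates of $\overline{\rho}^2$; the density, the window volume and the number of realizations are arbitrary, hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, volume = 5.0, 2.0                       # hypothetical density and window volume |D|
n = rng.poisson(rho * volume, size=200_000)  # point counts of many Poisson realizations

rho_hat = n / volume
naive = np.mean(rho_hat**2)                                       # (rho_hat)^2
factorial = np.mean(rho_hat * (rho_hat * volume - 1.0) / volume)  # N (N - 1) / |D|^2

# For a Poisson process E[N^2] = (rho |D|)^2 + rho |D|, so the naive estimate
# exceeds rho^2 by rho / |D| on average, while the factorial form does not.
print(naive, factorial, rho**2)
```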

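Assuming the two-point correlation function $g(r)$ to be known, Stoyan & Stoyan (1998) showed that an unbiased estimate of $\overline{\rho}^2$ is given by

$$\widehat{\rho^2}_V(r) = \frac{\sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} \frac{p_V(\mathbf{x}_i, r)\, p_V(\mathbf{x}_j, r)}{g(\|\mathbf{x}_i - \mathbf{x}_j\|)}}{\left( \int_{\mathcal{D}} \mathrm{d}^3x\; p_V(\mathbf{x}, r) \right)^2}. \qquad (69)$$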



Stoyan & Stoyan (1998) suggest a self-consistent iterative estimation of both $g(r)$ and $\overline{\rho}^2$. From simulations Stoyan & Stoyan (1998) infer that the estimators $\widehat{g}_4$ and $\widehat{g}_5$ are biased towards smaller values in the case of a Poisson and a Matérn cluster process. This bias is reduced in the corresponding improved estimators $\widehat{g}^{\,i}_4$ and $\widehat{g}^{\,i}_5$. In their analysis $\overline{\rho}^2$ was estimated by $(\widehat{\rho}_S(r))^2$, where instead of $p_V(\mathbf{x}, r)$ the surface weight

$$p_S(\mathbf{x}, r) = \frac{\mathrm{area}(\mathcal{D} \cap \partial\mathcal{B}_r(\mathbf{x}))}{4\pi r^2} \qquad (70)$$

was used in Eq. (63).

6. Correlation integral of IRAS galaxies

We apply the estimators for the correlation integral to a volume limited sample with $80\,h^{-1}$Mpc depth of the IRAS 1.2 Jy redshift catalogue (Fisher et al. 1995). As suggested by the sample geometry, we analyse the northern part (galactic coordinates) with 412 galaxies and the southern part with 376 galaxies separately. In Fig. 12 we compare the correlation integral for the northern and southern parts with the correlation integral for a Poisson process with the same number density, estimated with $\widehat{C}_0(r)$ to $\widehat{C}_5(r)$. As expected, $\widehat{C}_0(r)$ is biased towards lower values. The minus-estimators, especially $\widehat{C}_2(r)$, show large fluctuations on scales above $20\,h^{-1}$Mpc. On small scales out to $10\,h^{-1}$Mpc all the estimates $\widehat{C}_1(r)$ to $\widehat{C}_5(r)$ of the correlation integral give nearly the same results and are clearly above the Poisson result, indicating clustering. Lemson and Sanders (1991) already showed that on small scales the dependence of the conditional density $\Gamma(r)$ on the chosen estimator is negligible. In Fig. 13 we observe a strong scatter in the estimates of the correlation integral on large scales. The minus-estimators $\widehat{C}_1(r)$ and $\widehat{C}_2(r)$ deviate from $\widehat{C}_3(r)$ to $\widehat{C}_5(r)$; also the Ripley estimator $\widehat{C}_3(r)$ gives different results compared with $\widehat{C}_4(r)$ and $\widehat{C}_5(r)$. Whether the observed correlation integral becomes consistent with the correlation integral of a Poisson process on large scales depends on the chosen estimator.

In addition to the differences between the estimators, we observe fluctuations in the correlation integral between the northern and the southern part (see also Martínez et al. 1998). Kerscher et al. (1998) argued that these fluctuations are real structural differences between the northern and southern part of the sample, observable out to scales of $200\,h^{-1}$Mpc.

Fig. 13. A comparison of the estimators on large scales for the southern part of the IRAS sample: $\widehat{C}_0(r)$ dotted line; $\widehat{C}_1(r)$ short dashed line; $\widehat{C}_2(r)$ long dashed line; $\widehat{C}_3(r)$ dotted - short dashed line; $\widehat{C}_4(r)$ dotted - long dashed line; $\widehat{C}_5(r)$ short dashed - long dashed line.
Fig. 12. Different estimates of the correlation integral for a volume limited sample of the IRAS 1.2 Jy galaxy catalogue with $80\,h^{-1}$Mpc depth are shown. The solid line marks the results for the northern part, the dashed line the results for the southern part, and the dotted area marks the $1\sigma$ range of a Poisson process with the same number density.



6.1. A note on scaling 

Fig. 14 displays the same data as Fig. 12 in a double logarithmic plot. With all estimators we observe $C(r) \propto r^D$, $D \approx 2$ within an approximate scaling regime$^5$ up to at least $10\,h^{-1}$Mpc. Above $20\,h^{-1}$Mpc there seems to be a turnover towards $D \approx 3$. Sylos Labini et al. (1998) argue that this turnover is due to the sparseness of this galaxy catalogue. In the limit $r \to 0$ the scaling exponent $D$ is equal to the correlation dimension $D_2$ (Grassberger & Procaccia 1984). Clearly we find approximately the same scaling properties as Sylos Labini et al. (1998), who analysed this IRAS sample, and a number of others, using minus estimators equivalent to $\widehat{g}_1(r)$ and $\widehat{C}_1(r)$. Similar results have been obtained by Martínez et al. (1998), who determined Ripley's $K(r) = C(r)/\overline{\rho}$ with the Ripley-estimator, equivalent to $\widehat{C}_3(r)$, for a volume limited sample with $120\,h^{-1}$Mpc depth. In their Figure 10 the scaling regime with $D \approx 2$ extends out to $\approx 15\,h^{-1}$Mpc, showing a turnover towards $D \approx 3$ on larger scales.

In Fig. 15 we observe that the number $N_r$ of galaxies more distant than $r$ from the boundary of the sample window $\mathcal{D}$ (see Eq. (0)) becomes critically small on scales larger than $20\,h^{-1}$Mpc. This leads to the fluctuations in the minus estimators. Likewise, the corrections from the weights in the estimators $\widehat{C}_3(r)$ to $\widehat{C}_5(r)$ become more and more important on scales larger than $20\,h^{-1}$Mpc. Therefore, it is not clear whether the trend towards a scaling exponent $D \approx 3$ on large scales is a true physical one, or a result of the weighting schemes used.

A lively debate on the extent of the scaling regime is going on. See for example the Princeton discussion between Davis (1996) and Pietronero et al. (1996), the discussions at the Ringberg meeting (Bender et al. 1997), and more recently Guzzo (1997), Sylos Labini et al. (1998), McCauley (1998), Martínez et al. (1998), and Wu et al. (1998).

5 As we will argue below, a correlation dimension $D_2$ cannot be reliably extracted from a scaling regime with roughly one and a half decades. Therefore, we do not perform a numerical fit to estimate $D_2$.



We want to emphasize that two-point measures are insensitive to structures on large scales (see the examples in Szalay 1997 and Kerscher 1998). Therefore, the possible observation of $C(r) \propto r^3$ on large scales does not imply that we are looking at Poisson distributed points.

On the other hand, an estimate of the correlation dimension $D_2$ from one and a half decades only is error prone. To illustrate this, we calculate local scaling exponents $\nu_i$ from $N_i(r) \propto r^{\nu_i}$ (see e.g. Stoyan & Stoyan 1994, Borgani 1995 and McCauley 1997). We restrict ourselves to 167 points $\mathbf{x}_i$ with a distance larger than $12.5\,h^{-1}$Mpc from the boundary of the window to determine $N_i(r)$. $\nu_i$ is estimated using a linear regression of $\log(N_i(r))$ against $\log(r)$. The frequency histogram of the local scaling exponents peaks at $\nu_i \approx 2$, consistent with $C(r) \propto r^2$ on small scales out to $10\,h^{-1}$Mpc, but shows a large scatter (Fig. 16). A constant scaling exponent $D$ may be identified with the correlation dimension $D_2$ only if the scaling regime of the correlation integral extends over several decades. In Grassberger & Procaccia (1984) Fig. 3 a scaling over 15 decades is observed, as an unambiguous trace of fractality. McCauley (1997) addresses the problem of a limited scaling regime in the estimation of (multi-) fractal dimensions in detail.

Fig. 14. A double logarithmic plot of different estimates of the correlation integral, for the northern part (solid line) and the southern part (long dashed line), together with functions proportional to $r^2$ (dotted line) and $r^3$ (short dashed line).

7. Remarks



- The qualitative interpretation of clustering properties is easier with the two-point correlation function than with the correlation integral. However, the necessary binning in the estimators of the two-point correlation function may give misleading results, whereas a quantitative analysis with the correlation integral is straightforward.

- Estimates of the two-point correlation function $g(r)$ may be impaired by shot-noise, due to the finite binning $\Delta$. However, no binning is needed in the correlation integral; a shot-noise contribution is visible only at small scales, if at all.

- Sometimes kernel-based methods are used for the determination of the two-point correlation function. The number of points in a shell $n_i(r)$ (Eq. (p4|)) is replaced by a kernel-weighted sum, where $k_\Delta$ is a kernel function of width $\Delta$, satisfying $k_\Delta(r) = k_\Delta(-r) \geq 0$ and $\int_{-\infty}^{\infty} \mathrm{d}r\, k_\Delta(r) = 1$ (see e.g. Stoyan & Stoyan 1994 and Pons-Bordería et al. 1999).

- In the Rivolo- and in the biased Davis-Peebles-estimator we have set the global weight $\omega_g$ to unity, which is correct for small and intermediate scales. On larger scales we have $\omega_g > 1$ and both estimators underestimate the two-point correlation function.

- On small scales, the weights used in the estimators of the correlation integral and the two-point correlation function converge towards unity and the biases $a$ and $b$ converge towards zero and unity, respectively. Therefore, all estimators of the correlation integral and the two-point correlation function give the same results on small scales. However, the quadratic pole at zero in the estimators $\widehat{g}_3$ to $\widehat{g}_5$ gives rise to biases on very small scales (see Stoyan & Stoyan 1994 and Pons-Bordería et al. 1999).

- Only finite-size corrections were discussed. Therefore, the estimators described are applicable to complete or volume-limited samples. Usually a correction for systematic incompleteness effects in magnitude limited catalogues is performed by weighting with the inverse selection function (see e.g. Martínez 1996). This relies on the assumption that the clustering properties of galaxies are independent of their absolute magnitude.

- Estimators for the n-point correlation functions, similar to the Landy-Szalay- and Hamilton-estimators for the two-point correlation function, were introduced by Szapudi & Szalay (1998) and Jing & Börner (1998). It is not clear whether the biases found in the estimators for the two-point correlation function are also present in the related estimators for the n-point correlation function. Unbiased estimators for the n-th moment measures are discussed by Hanisch (1983).

- The fit of a straight line to the log-log plot of the non-parametric estimate of the correlation integral is only one way to determine the scaling properties of the point distribution. Maximum likelihood methods are discussed by Ogata & Katsura (1991).

- The attribute (ratio-) unbiased of an estimator makes sense only for stationary point processes. We emphasize that stationarity (i.e. homogeneity) is a model assumption. It is not possible to test global stationarity in an objective way with one realization only. See Matheron (1989) for a detailed discussion of the problems inherent in a statistical analysis of one data set.

Fig. 15. The number $N_r$ of galaxies with a distance larger than $r$ to the boundary of the sample window $\mathcal{D}$, for the northern part (solid line) and the southern part (dashed line).

Fig. 16. The frequency distribution of the local scaling exponents $\nu_i$.

8. Conclusions and recommended estimators



In this article we are concerned with the geometrical na- 
ture of the two-point measures. As a starting point we 
discussed several well-known, ratio-unbiased estimators 
of the correlation integral. From two examples we saw 
that all the estimators could reproduce the theoretical 
mean values of the correlation integral for a Poisson pro- 
cess and with a small negative bias for a line segment pro- 
cess. The estimators using weighting schemes show a small 
variance, whereas the variance of the minus-estimators 
becomes prohibitive on large scales. We investigated the 
close relation of the geometrical estimators of the two-point correlation function $g(r)$ with the estimators of the correlation integral $C(r)$.

Expressing the pair-counts DR and RR in terms of 
geometrical quantities enabled us to calculate the biases 
entering the Davis-Peebles-estimator, the Landy-Szalay- 
estimator, and the Hamilton-estimator. With simulations 
of a structured point process we have quantified these bi- 
ases: on small scales they are unimportant in the analysis 
of clustered galaxies. However, on large scales the biases 
are not negligible, especially when only a few structure 
elements like filaments or sheets are inside the sample. 

As a real-life example we applied the estimators to a volume limited sample extracted from the IRAS 1.2 Jy galaxy catalogue (Fisher et al. 1995) with $80\,h^{-1}$Mpc depth. On scales up to $10\,h^{-1}$Mpc the estimators $\widehat{C}_1(r)$ to $\widehat{C}_5(r)$ gave nearly identical results, and the shape of $C(r)$ is well determined in the northern and southern part (galactic coordinates) of the sample separately. However, on scales larger than $20\,h^{-1}$Mpc the results differ, not only between the minus-estimators and the estimators using weighting schemes, but also between ratio-unbiased estimators using different weighting schemes. In a scaling analysis we found $C(r) \propto r^D$ with $D \approx 2$ up to $10\,h^{-1}$Mpc, and a possible turnover to $D \approx 3$ on scales larger than $20\,h^{-1}$Mpc. However, the extent of this scaling regime cannot be reliably determined from this galaxy sample. Since the scaling regime is roughly one and a half decades only, an estimate of the correlation dimension $D_2$ from the scaling exponent $D$ is unreliable. The large scatter seen in the local scaling exponents reflects this uncertainty. We could also confirm the fluctuations found by Kerscher et al. (1998) in the clustering properties between the northern and southern parts of the sample (see also Martínez et al. 1998).
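The local scaling exponents mentioned above are obtained from a linear regression of $\log N_i(r)$ against $\log r$ for each centre. A minimal sketch of such a regression is given below; the function name `local_scaling_exponents`, the choice of radii and the handling of empty balls are assumptions made for illustration, not the procedure used for the IRAS sample.

```python
import numpy as np

def local_scaling_exponents(points, centres, radii):
    """Slope nu_i of log N_i(r) against log r for each centre x_i."""
    points = np.asarray(points, dtype=float)
    log_r = np.log(radii)
    nus = []
    for c in centres:
        d = np.linalg.norm(points - c, axis=1)
        # N_i(r): number of other points within distance r of the centre.
        counts = np.array([np.count_nonzero((d > 0.0) & (d <= r)) for r in radii])
        ok = counts > 0                 # the logarithm is undefined for empty balls
        if ok.sum() >= 2:
            slope, _ = np.polyfit(log_r[ok], np.log(counts[ok]), 1)
            nus.append(slope)
    return np.array(nus)
```

For Poisson distributed points in three dimensions the slopes scatter around 3; centres should be restricted to points farther from the window boundary than the largest radius, as done in the text.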



These large scale fluctuations and the differences in 
the estimated correlation integral suggest that we have to 
wait for the next generation surveys, like the SDSS and 
the 2dF, if we want to determine the two-point measures 
of galaxies unambiguously on large scales. 

8.1. Recommended estimators 

A general recommendation is that one should compare the 
results of at least two estimators. Since estimators of the 
two-point correlation function are ratio-unbiased only in 
the impracticable limit of zero bin width, statistical tests 
should be based on integral quantities like the correlation 
integral C(r) or the L(r) function defined in Appendix A. 

On small scales the weights in the estimators converge towards unity and also the biases become negligible for the clustered galaxy distribution. This is confirmed by our analysis of the IRAS 1.2 Jy catalogue, where all estimators of $C(r)$ give nearly the same results on scales smaller than $10\,h^{-1}$Mpc. Therefore, on small and intermediate scales, i.e. on scales where $\xi_2(r)$ is of the order of or larger than unity, the best estimator is the one with the smallest variance and the smallest bias. For the correlation integral this is the Ripley estimator $\widehat{C}_3$. For complicated sample geometries the numerical implementation of $\widehat{C}_4$ is simpler than that of $\widehat{C}_3$ (see Appendix B). Additionally, the variance of $\widehat{C}_4$ is only slightly increased, and the assumption of isotropy does not enter the construction of this estimator.

On large scales, i.e. on scales with $\xi_2(r) < 1$, the comparison of the results obtained with different ratio-unbiased estimators may serve as an internal consistency check, and provide conclusive means to judge the reliability of the estimates. The scale at which significant differences between ratio-unbiased estimators are found can be used to define a scale of reliability for the sample under consideration. In particular, a comparison between the Ripley-estimator $\widehat{C}_3$ and the Ohser-Stoyan-estimator $\widehat{C}_4$ is useful, since the assumption of isotropy does not enter the construction of the Ohser-Stoyan estimator. A final comparison with the minus-estimator $\widehat{C}_1$ can illustrate the reliability of the results on large scales.

Guided by our analysis of estimators for the correlation integral we expect that the estimators of the two-point correlation function behave similarly and that one may use either the Rivolo estimator $\widehat{g}_3$ or the Fiksel estimator $\widehat{g}_4$ on small and intermediate scales. On very small scales the quadratic pole at zero in the estimators $\widehat{g}_3$ to $\widehat{g}_5$ can lead to biases (Stoyan & Stoyan 1994). No estimator of the correlation integral is impaired by this pole. Similarly to the correlation integral, a comparison of the Rivolo-estimator $\widehat{g}_3$, the Fiksel-estimator $\widehat{g}_4$, and the minus-estimator $\widehat{g}_1$ may serve as an internal consistency check on large scales.

In Sect. [| we discussed how biases enter the Davis-Peebles estimator $\widehat{g}_7$, the Landy-Szalay estimator $\widehat{g}_8$, and the Hamilton estimator $\widehat{g}_9$ for arbitrary stationary and isotropic point processes. We quantified them for a line segment process. The relevance of these biases in cosmological situations, and a comparison of the variances of the pair-count estimators with the variances of the geometrical estimators, will be the subject of future work.
Appendix A: Two point measures

In this appendix we summarize common two-point measures.

The product density$^6$

$$\rho_2(\mathbf{x}_1, \mathbf{x}_2)\, \mathrm{d}V(\mathbf{x}_1)\, \mathrm{d}V(\mathbf{x}_2) \qquad (A.1)$$

is the probability to find a point in the infinitesimal volume $\mathrm{d}V(\mathbf{x}_1)$ and another in $\mathrm{d}V(\mathbf{x}_2)$. In the following we assume that the point process is stationary and isotropic, hence the statistical properties of ensemble averages do not depend on the specific location and orientation in space. In this case $\rho_2(\mathbf{x}_1, \mathbf{x}_2)$ only depends on the distance $r = \|\mathbf{x}_1 - \mathbf{x}_2\|$ of the two points:

$$\overline{\rho}^2 g(r) = \overline{\rho}^2 (1 + \xi_2(r)) = \rho_2(\mathbf{x}_1, \mathbf{x}_2). \qquad (A.2)$$

The correlation integral $C(r) = \int_0^r \mathrm{d}s\, \overline{\rho}\, 4\pi s^2 g(s)$ is related to Ripley's $K$-function (also known as the reduced second moment measure (Stoyan et al. 1995)) by

$$C(r) = \overline{\rho}\, K(r). \qquad (A.3)$$

For statistical tests, often the $L(r)$ function is used:

Care has to be taken, since the definition of $L(r)$ is not unique throughout the literature. In some applications the integrated normed cumulant is considered:

$$J_3(r) = \int_0^r \mathrm{d}s\, s^2\, \xi_2(s). \qquad (A.5)$$

6 In the statistical literature the product density $\rho_2(\mathbf{x}_1, \mathbf{x}_2)$ is defined as the Lebesgue density of the second factorial moment measure (e.g. Stoyan et al. 1995).

Clearly, $C(r) = \overline{\rho}\, \frac{4\pi}{3} r^3 + \overline{\rho}\, 4\pi J_3(r)$. Coleman & Pietronero (1992) use

$$\Gamma^*(r) = \frac{C(r)}{\frac{4\pi}{3} r^3} \quad \text{and} \quad \Gamma(r) = \overline{\rho}\, g(r). \qquad (A.6)$$
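The decomposition of $C(r)$ into a Poisson term and the integrated normed cumulant $J_3(r)$ can be verified numerically. The sketch below assumes a power-law $\xi_2(s) = (s/s_0)^{-\gamma}$; the parameter values and the helper `trapezoid` are hypothetical and serve only as an example.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

rho, s0, gamma = 0.01, 5.0, 1.8        # hypothetical density and power-law parameters
xi = lambda s: (s / s0) ** (-gamma)    # assumed normed cumulant xi_2(s)

r = 20.0
s = np.linspace(1e-6, r, 200_001)

# Correlation integral from its definition, with g = 1 + xi_2.
C_direct = trapezoid(rho * 4.0 * np.pi * s**2 * (1.0 + xi(s)), s)

# The same quantity from the Poisson term plus 4 pi rho J_3(r), Eq. (A.5).
J3 = trapezoid(s**2 * xi(s), s)
C_split = rho * 4.0 * np.pi / 3.0 * r**3 + rho * 4.0 * np.pi * J3

print(C_direct, C_split)
```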

Another common tool is the variance of cell counts. We are interested in the fluctuations in the number of points $N(C)$ in a spatial domain $C$. The variance of $N(C)$ is given by (see e.g. Stoyan & Stoyan 1994)

$$\mathbb{V}[N(C)] = \mathbb{E}[N(C)^2] - \mathbb{E}[N(C)]^2 = \int_C \mathrm{d}^dx_1 \int_C \mathrm{d}^dx_2\; \rho_2(\mathbf{x}_1, \mathbf{x}_2) + \overline{\rho}|C| - (\overline{\rho}|C|)^2. \qquad (A.7)$$

$\mathbb{E}$ is the ensemble average, i.e. the average over different realizations. For a Poisson process $\mathbb{V}[N(C)] = \overline{\rho}|C|$. Also $\sigma(r)^2$, the fluctuations in excess of Poisson inside a sphere $\mathcal{B}_r$ with radius $r$, are considered:

$$\mathbb{V}[N(\mathcal{B}_r)] = \overline{\rho}|\mathcal{B}_r| + \sigma(r)^2\, (\overline{\rho}|\mathcal{B}_r|)^2. \qquad (A.8)$$



Hence, 



a{rf 



/ / dWy^dlx-yl 

Prl J B r J B r 

C(r) 



p\B r 



1 = (L(r))" - 1. 



(A.9) 



Often spectral methods are used. The power spectrum can be defined as the Fourier transform of the normed cumulant $\xi_2$:

$$P(k) = \frac{1}{(2\pi)^3} \int \mathrm{d}^3x\; e^{-i \mathbf{k} \cdot \mathbf{x}}\, \xi_2(\|\mathbf{x}\|). \qquad (A.10)$$

Newman et al. (1994) discuss problems in the estimation of the power spectrum.

Appendix B: Implementation 

In this Appendix we give a short description of the imple- 
mentation of the estimators. 

B.1. Minus estimators

The main computational problem is to determine the distance from a galaxy to the boundary of the sample, or equivalently, whether the galaxy is inside the shrunken window $\mathcal{D}_{-r}$. No general recipe is available and the implementation depends on the specific survey geometry under consideration.
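For the simplest geometry the test is straightforward. The following sketch assumes a cuboid window with one corner at the origin; the function name `in_shrunken_cuboid` is hypothetical, and real survey geometries need their own boundary-distance routine.

```python
import numpy as np

def in_shrunken_cuboid(points, r, box):
    """True for points farther than r from every face of D = [0,Lx] x [0,Ly] x [0,Lz]."""
    points = np.asarray(points, dtype=float)
    box = np.asarray(box, dtype=float)
    # Distance to the nearest face; negative for points outside the cuboid.
    dist = np.minimum(points, box - points).min(axis=1)
    return dist > r

pts = np.array([[0.5, 0.5, 0.5], [0.1, 0.5, 0.5]])
mask = in_shrunken_cuboid(pts, 0.2, (1.0, 1.0, 1.0))
n_r = int(mask.sum())   # number N_r of points usable by the minus estimators at scale r
```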

B.2. Ripley and Rivolo estimators 

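For both the Ripley and the Rivolo estimators $\widehat{C}_3$ and $\widehat{g}_3$ we have to calculate the local weight Eq. (fl4|):

$$\omega_l(\mathbf{x}_i, s) = \begin{cases} \dfrac{4\pi s^2}{\mathrm{area}(\partial\mathcal{B}_s(\mathbf{x}_i) \cap \mathcal{D})} & \text{for } \partial\mathcal{B}_s(\mathbf{x}_i) \cap \mathcal{D} \neq \emptyset, \\ 0 & \text{for } \partial\mathcal{B}_s(\mathbf{x}_i) \cap \mathcal{D} = \emptyset. \end{cases}$$

The local weight is inversely proportional to the part of the surface of a sphere with radius $s$ drawn around the point $\mathbf{x}_i$ which is inside the survey geometry $\mathcal{D}$ (see Fig ||). For a cuboid sample Baddeley et al. (1993) give explicit expressions. In our calculations we discretized the sphere $\partial\mathcal{B}_s(\mathbf{x}_i)$, and approximated $\omega_l(\mathbf{x}_i, s)$ by the inverse fraction of surface elements inside the sample geometry $\mathcal{D}$. Equivalently, a random distribution of points on the sphere may be used. In both approaches, we need a fast method to determine if a point is inside the sample $\mathcal{D}$. Rivolo (1986) suggested counting the number of random points inside $\mathcal{D}$ in the shell of radius $s$ and width $\Delta$ around $\mathbf{x}_i$ to estimate

$$\mathrm{area}(\partial\mathcal{B}_s(\mathbf{x}_i) \cap \mathcal{D})\, \Delta.$$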
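The variant with random points on the sphere can be sketched as follows. The callable `inside`, the unit-box test function and the number of directions are hypothetical placeholders; any fast point-in-sample test for the survey geometry could be substituted.

```python
import numpy as np

def local_weight(x, s, inside, rng, n_dir=20_000):
    """Estimate omega_l(x, s) as the inverse fraction of the sphere surface inside D."""
    u = rng.normal(size=(n_dir, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform directions on the unit sphere
    frac = np.mean(inside(x + s * u))               # fraction of surface points inside D
    return 0.0 if frac == 0.0 else 1.0 / frac

def in_unit_box(p):
    """Hypothetical sample geometry: the unit cube [0, 1]^3."""
    return np.all((p >= 0.0) & (p <= 1.0), axis=1)

rng = np.random.default_rng(2)
w_centre = local_weight(np.array([0.5, 0.5, 0.5]), 0.3, in_unit_box, rng)  # sphere fully inside
w_face = local_weight(np.array([0.5, 0.5, 0.0]), 0.3, in_unit_box, rng)    # about half inside
```

A point on a face of the cube sees roughly half of a small sphere inside the sample, so its weight is close to two.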

Explicit expressions for the global weight

$$\omega_g(s) = \frac{|\mathcal{D}|}{|\{\mathbf{x} \in \mathcal{D} \mid \partial\mathcal{B}_s(\mathbf{x}) \cap \mathcal{D} \neq \emptyset\}|}$$

for a cuboid sample can be found in Baddeley et al. (1993). For more general sample geometries it seems necessary to use Monte-Carlo methods. In our calculations we considered only radii $s$ for which $\omega_g(s) = 1$ is fulfilled.



B.3. Ohser-Stoyan type estimators 

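For the Ohser-Stoyan estimator $\widehat{C}_4$ and the Fiksel estimator $\widehat{g}_4$, we have to calculate the set-covariance Eq. (|l^):

$$\gamma_{\mathcal{D}}(\mathbf{x}) = |\mathcal{D} \cap \mathcal{D} + \mathbf{x}|.$$

We obtain for a vector $\mathbf{x} = (x, y, z)$ and a cuboid sample with side lengths $L_x > |x|$, $L_y > |y|$, $L_z > |z|$

$$\gamma_{\mathcal{D}}(\mathbf{x}) = (L_x - |x|)(L_y - |y|)(L_z - |z|), \qquad (B.1)$$

and for a spherical sample with radius $R$ and $r = \|\mathbf{x}\| < R$

$$\gamma_{\mathcal{D}}(\mathbf{x}) = \frac{4\pi}{3} \left( R^3 - \frac{3}{4} r R^2 + \frac{1}{16} r^3 \right). \qquad (B.2)$$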



For more general sample geometries one has to rely on a Monte-Carlo method: draw random points $\mathbf{y}_i$ inside $\mathcal{D}$ and estimate $\gamma_{\mathcal{D}}(\mathbf{x})/|\mathcal{D}|$ by the fraction of points $\mathbf{y}_i + \mathbf{x}$ inside the sample $\mathcal{D}$.

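The isotropized set-covariance $\overline{\gamma}_{\mathcal{D}}(r)$ used in the estimators $\widehat{C}_5$ and $\widehat{g}_5$ can be calculated from $\gamma_{\mathcal{D}}(\mathbf{x})$. For a cuboid sample we obtain

$$\overline{\gamma}_{\mathcal{D}}(r) = L_x L_y L_z - \frac{r}{2} (L_x L_y + L_x L_z + L_y L_z) + \frac{2 r^2}{3\pi} (L_x + L_y + L_z) - \frac{r^3}{4\pi}, \qquad (B.3)$$

and for a spherical sample $\overline{\gamma}_{\mathcal{D}}(\|\mathbf{x}\|) = \gamma_{\mathcal{D}}(\mathbf{x})$. Again, for more general sample geometries one has to rely on a Monte-Carlo method: consider randomly distributed points $\mathbf{y}_i$ inside $\mathcal{D}$ and unit vectors $\mathbf{u}_j$ randomly distributed on the sphere. Now estimate $\overline{\gamma}_{\mathcal{D}}(r)/|\mathcal{D}|$ by the fraction of points $\mathbf{y}_i + r \mathbf{u}_j$ inside the sample $\mathcal{D}$. Another possibility is to use the number of random point pairs $RR(r)$ inside the sample to estimate $4\pi r^2 \Delta\, \rho_{rd}^2\, \overline{\gamma}_{\mathcal{D}}(r)$ according to Eq. (E3).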
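The Monte-Carlo recipe for the isotropized set-covariance can be compared with the closed form for a cuboid, Eq. (B.3). In the sketch below the function names, the side lengths and the sample size are hypothetical; the closed form is valid for $r$ not larger than the shortest side.

```python
import numpy as np

def iso_set_cov_cuboid(r, L):
    """Closed form of Eq. (B.3) for a cuboid with side lengths L = (Lx, Ly, Lz)."""
    Lx, Ly, Lz = L
    return (Lx * Ly * Lz
            - 0.5 * r * (Lx * Ly + Lx * Lz + Ly * Lz)
            + 2.0 * r**2 / (3.0 * np.pi) * (Lx + Ly + Lz)
            - r**3 / (4.0 * np.pi))

def iso_set_cov_mc(r, L, n=200_000, seed=0):
    """Monte-Carlo estimate: |D| times the fraction of points y + r u that stay inside D."""
    rng = np.random.default_rng(seed)
    L = np.asarray(L, dtype=float)
    y = rng.random((n, 3)) * L                       # random points inside the cuboid
    u = rng.normal(size=(n, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)    # random unit vectors
    shifted = y + r * u
    inside = np.all((shifted >= 0.0) & (shifted <= L), axis=1)
    return inside.mean() * np.prod(L)

L = (1.0, 2.0, 3.0)
print(iso_set_cov_cuboid(0.5, L), iso_set_cov_mc(0.5, L))
```

Replacing the cuboid membership test by a point-in-sample test for the survey geometry gives the general Monte-Carlo estimate described in the text.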
B.4. DD, DR, and RR

$DD(r)$, $DR(r)$, and $RR(r)$ are the number of data-data, data-random, and random-random point pairs inside $\mathcal{D}$ with a distance in $[r, r + \Delta]$. Point pairs in $DD(r)$ and $RR(r)$ are counted twice. Care has to be taken to use enough random points $N_{rd}$:

For small $r$ the value of the isotropized set-covariance $\overline{\gamma}_{\mathcal{D}}(r)$ is close to $|\mathcal{D}|$, and $RR(r)$ has to be approximately $4\pi r^2 \Delta N_{rd}^2 / |\mathcal{D}|$ (see Eq. (|||)). If significant deviations for small $r$ occur, the number of random points $N_{rd}$ should be increased.

At small scales the local weight $\omega_l(\mathbf{x}_i, r)$ equals unity for most of the points $\mathbf{x}_i$. From Eq. ( p5| ) we recognize that $DR(r)$ is approximately $4\pi r^2 \Delta N_{rd} N / |\mathcal{D}|$. Again a significant deviation indicates that more random points are needed.
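The consistency check for $RR(r)$ can be sketched as below for random points in a unit cube. The function name `rr_counts`, the number of points and the chosen shell are hypothetical; the brute-force distance matrix is only practical for a few thousand points.

```python
import numpy as np

def rr_counts(points, r, delta):
    """Number of ordered pairs with separation in [r, r + delta] (each pair counted twice)."""
    sq = np.sum(points**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T   # squared distances
    np.fill_diagonal(d2, -1.0)                                  # exclude self-pairs
    return int(np.count_nonzero((d2 >= r**2) & (d2 <= (r + delta)**2)))

rng = np.random.default_rng(3)
n_rd, volume = 2000, 1.0
random_points = rng.random((n_rd, 3))        # random points in the unit cube

r, delta = 0.05, 0.005
rr = rr_counts(random_points, r, delta)
expected = 4.0 * np.pi * r**2 * delta * n_rd**2 / volume
print(rr / expected)   # should be of order one if enough random points are used
```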



